Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates
In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. Furthermore, under a structural sparsity assumption, we first illustrate an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the step-size. Next, we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate of convergence depends only poly-logarithmically on the dimensionality.
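To make the zeroth-order setting concrete, below is a minimal sketch of a standard two-point Gaussian-smoothing gradient estimator with batch averaging for variance reduction. This is an illustrative assumption, not the paper's exact construction: the function name `zo_gradient` and all parameters are hypothetical, and the estimate is unbiased only for the smoothed surrogate of `f`, not for the true gradient.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, batch=10, rng=None):
    # Two-point Gaussian-smoothing zeroth-order gradient estimate,
    # averaged over a batch of random directions to reduce variance.
    # (Hypothetical sketch, not the paper's exact update rule.)
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    g = np.zeros(d)
    for _ in range(batch):
        u = rng.standard_normal(d)              # random Gaussian direction
        g += (f(x + mu * u) - f(x)) / mu * u    # forward-difference estimate
    return g / batch
```

For a smooth function the averaged estimate concentrates around the true gradient as the batch grows and the smoothing parameter `mu` shrinks, which is the sense in which such estimators can stand in for stochastic gradients inside conditional gradient or gradient updates.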
Reviews: Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates
Updated comments: I carefully read the authors' response and the paper again. I was mistaken when I read Algorithm 1 (Eq. (2.3)): the authors control the variance by simply averaging a batch of unbiased stochastic gradients. In this paper, the authors consider the problem of zeroth-order (non-)convex stochastic optimization via conditional gradient and gradient methods. However, all the techniques used are already known, yet none of this prior work is mentioned in the paper.
Balasubramanian, Krishnakumar, Ghadimi, Saeed
Papers published at the Neural Information Processing Systems Conference.